Learning to Match Names Across Languages

Authors

  • Inderjeet Mani
  • Alex Yeh
  • Sherri L. Condon
Abstract

We report on research into matching names written in different scripts across languages. We explore two trainable approaches based on comparing pronunciations. The first, a cross-lingual approach, uses an automatic name-matching program that exploits rules derived from phonological comparisons of the two languages carried out by humans. The second, a monolingual approach, relies only on automatic comparison of the phonological representations of each pair. Alignments produced by each approach are fed to a machine learning algorithm. Results show that, with the monolingual approach, machine-learning-based comparison of person names in English and Chinese achieves an F-measure of over 97.0.
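As a rough illustration of the monolingual idea (aligning two phonological representations and passing the alignment to a learner), the following minimal sketch aligns two phoneme sequences with plain Levenshtein edit distance and derives a few numeric features. It is not the authors' method: the alignment algorithm, the feature names, and the phoneme symbols are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

GAP = "-"  # hypothetical gap symbol for unaligned phonemes


def align(a: List[str], b: List[str]) -> Tuple[int, List[Tuple[str, str]]]:
    """Levenshtein alignment of two phoneme sequences (unit costs assumed)."""
    n, m = len(a), len(b)
    # dist[i][j] holds the edit distance between a[:i] and b[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub,  # match or substitution
                dist[i - 1][j] + 1,        # deletion from a
                dist[i][j - 1] + 1,        # insertion from b
            )
    # Trace back from the bottom-right cell to recover aligned pairs.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if a[i - 1] == b[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + sub:
                pairs.append((a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((a[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, b[j - 1]))
            j -= 1
    pairs.reverse()
    return dist[n][m], pairs


def alignment_features(a: List[str], b: List[str]) -> Dict[str, float]:
    """Turn an alignment into features a classifier could consume (assumed set)."""
    distance, pairs = align(a, b)
    matches = sum(1 for x, y in pairs if x == y)
    return {
        "distance": float(distance),
        "match_ratio": matches / max(len(pairs), 1),
        "length_diff": float(abs(len(a) - len(b))),
    }


# Hypothetical phoneme strings for two renderings of the same name.
features = alignment_features(["m", "a", "n", "i"], ["m", "a", "n"])
```

Feature dictionaries like `features`, paired with match or non-match labels, could then be used to train any standard classifier; the abstract does not specify which learning algorithm or features the authors used.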

Similar resources

A General Path-Based Representation for Predicting Program Properties

Predicting program properties such as names or expression types has a wide range of applications. It can ease the task of programming, and increase programmer productivity. A major challenge when learning from programs is how to represent programs in a way that facilitates effective learning. We present a general path-based representation for learning from programs. Our representation is purely...

Learning of letter names follows similar principles across languages: Evidence from Hebrew.

Letter names play an important role in early literacy. Previous studies of letter name learning have examined the Latin alphabet. The current study tested learners of Hebrew, comparing their patterns of performance and types of errors with those of English learners. We analyzed letter-naming data from 645 Israeli children who had not begun formal reading instruction: a younger group (mean age 5...

A System for Recognizing and Classifying Names in Persian Texts

A named entity recognition (NER) system identifies one or more kinds of names in a text and classifies them into specified categories. These categories can be names of people, organizations, companies, and places (country, city, street, etc.), times related to names (date and time), financial values, percentages, etc. Although a lot of research has been done on NER during the past decade in ...

An Approach for Automatic Matching of Descriptive Addresses

Address matching (also called geocoding) is an applied spatial analysis which is frequently used in everyday life. Almost all desktop and web-based GIS environments are equipped with a module to match the addresses expressed in pre-defined standard formats on the map. It is an essential prerequisite for many of the functionalities provided by location-based services (e.g. car navigation). Sever...

Vernacular dominance in folk taxonomy: a case study of ethnospecies in medicinal plant trade in Tanzania

BACKGROUND Medicinal plants are traded as products with vernacular names, but these folk taxonomies do not always correspond one-to-one with scientific plant names. These local species entities can be defined as ethnospecies and can match, under-differentiate or over-differentiate as compared to scientific species. Identification of plant species in trade is further complicated by the processed...


Publication date: 2008